Model-Based Hierarchical Clustering
Abstract
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure including number of clusters, the depth of the tree and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection.
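The role of the marginal likelihood as a merge criterion can be illustrated with a minimal sketch. It is not the paper's algorithm: it ignores the feature-set partitioning entirely, assumes the flat clustering has already produced one word-count vector per cluster, and uses a symmetric Dirichlet prior with a hypothetical concentration `alpha`. The function names (`log_marginal`, `merge_gain`, `greedy_merge`) are illustrative only. Two clusters are merged when pooling their counts raises the total log marginal likelihood, and merging stops when no pair improves it.

```python
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a token sequence with the given word counts,
    under a multinomial likelihood and a symmetric Dirichlet(alpha) prior.
    (Assumption: the multinomial coefficient is omitted, i.e. ordered tokens.)"""
    k = len(counts)
    total = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + total)
    for n in counts:
        out += lgamma(alpha + n) - lgamma(alpha)
    return out


def merge_gain(a, b, alpha=1.0):
    """Change in log marginal likelihood if clusters a and b share one
    word distribution instead of having separate ones."""
    merged = [x + y for x, y in zip(a, b)]
    return log_marginal(merged, alpha) - log_marginal(a, alpha) - log_marginal(b, alpha)


def greedy_merge(clusters, alpha=1.0):
    """Repeatedly merge the pair with the largest positive gain.
    Stops when no merge improves the marginal likelihood."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gain = merge_gain(clusters[i], clusters[j], alpha)
                if best is None or gain > best[0]:
                    best = (gain, i, j)
        gain, i, j = best
        if gain <= 0:
            break
        merged = [x + y for x, y in zip(clusters[i], clusters[j])]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical word counts over a 3-word vocabulary for three flat clusters.
a = [20, 1, 0]
b = [18, 2, 1]
c = [0, 1, 25]
result = greedy_merge([a, b, c])
```

In this toy input, `a` and `b` have similar word profiles and `c` does not, so the marginal likelihood favors pooling the first two and keeping the third separate. This is the sense in which the criterion selects the number of clusters without a separately tuned penalty term.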
Similar resources
HIERARCHICAL DATA CLUSTERING MODEL FOR ANALYZING PASSENGERS’ TRIP IN HIGHWAYS
One of the most important issues in urban planning is developing sustainable public transportation. The basic condition for this is analyzing current conditions, especially on the basis of data. Data mining is a set of new techniques that go beyond statistical data analysis. Clustering techniques are a subset of data mining, one of which is used for analyzing passengers’ trips. The result of...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Application of 3D-QSAR on a Series of Potent P38-MAP Kinase Inhibitors
One of the most applied methods in drug industry for development of new drugs is 3D-QSAR methodology. As p38-mitogen-activated protein kinase (p38-MAPK) plays a crucial role in regulating the production of such proinflammatory cytokines as tumor necrosis factor-α (TNF-α) and interleukin-1, emerging as an attractive target for new anti-inflammatory agents, we used a 3D-QSAR based method of Compa...
PRFM Model Developed for the Separation of Enterprise Customers Based on the Distribution Companies of Various Goods and Services
In this study, a new model of combining variables affecting the classification of customers is introduced which is based on a distribution system of goods and services. Given the problems that the RFM model has in various distribution systems, a new model for resolving these problems is presented. The core of this model is the older RFM. The new model that has been proposed as PRFM, consists of...
A Novel Hybrid Clustering Method Using Artificial Immune System and Hierarchical Clustering
Artificial immune system (AIS) is one of the meta-heuristic algorithms used to solve complex problems. With large volumes of data, making rapid decisions and obtaining stable results are among the most challenging tasks because real-world conditions change rapidly. Clustering is a possible solution for overcoming these problems. The goal of clustering analysis is to group similar objects. AIS algor...
Publication date: 2000